Effective Language Representations for Danmaku Comment Classification in Nicovideo
Abstract
Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, many irrelevant comments contaminate the quality of the information provided by videos. This pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that may appear in Nicovideo content, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned so that it can determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated each model on an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor for other social media text.
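The abstract does not spell out how the abstention option is implemented. As a rough illustration only, the sketch below uses post-hoc confidence thresholding, a simpler and different technique from training a classifier with abstention: a comment is assigned a category only when the top softmax probability clears a threshold, and is otherwise marked as unclear. The function names, the category labels, and the 0.7 threshold are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw classifier scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_abstention(
    logits: Sequence[float],
    categories: Sequence[str],
    threshold: float = 0.7,  # hypothetical cut-off, not from the paper
) -> Optional[str]:
    """Return the predicted category, or None to abstain on unclear comments."""
    if len(logits) != len(categories):
        raise ValueError("logits and categories must have the same length")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Abstain when the classifier is not confident enough in any category.
    return categories[best] if probs[best] >= threshold else None


# Hypothetical category labels and logits, for illustration only.
labels = ["game", "music", "anime"]
confident = classify_with_abstention([4.0, 0.5, 0.1], labels)
unclear = classify_with_abstention([1.0, 1.1, 0.9], labels)
```

In this sketch the logits would come from a fine-tuned classifier head; raising the threshold trades coverage for precision, since more comments are left unassigned.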
Similar resources
Comment on ‘MeSH-up: effective MeSH text classification for
Information retrieval is an important task that requires specific attention in the biomedical domain where controlled vocabularies are available to characterize and organize textual content. A recent article published in Bioinformatics (Trieschnigg et al., 2009) confirms that there is a continued interest in the community to address this problem and achieve ‘improved document retrieval’. As sho...
Document Classification by Inversion of Distributed Language Representations
The goal of this note is to point out that any distributed representation can be turned into a classifier through inversion via Bayes rule. The approach is simple and modular, in that it will work with any language representation whose training can be formulated as optimizing a probability model. In our application to 2 million sentences from Yelp reviews, we also find that it performs as well ...
Learning Representations for Relation Classification
Knowledge bases can be applied to a wide variety of tasks such as search and question answering, however they are plagued by the problem of incompleteness. In this project, we propose two models for automated relation classification using extracted entity pairs and related sentences from natural text. We evaluate both models on a portion of the Stanford KBP dataset across 38 relations, achievin...
Text Representations for Patent Classification
gives a small, but significant, improvement in classification results on the CLEF-IP 2011 corpus, compared with classification on abstracts only. The effort involved in parsing the descriptions is considerable, however: Because of the long sentences and the dense word use, a parser will have much more difficulty in processing text from the description section than from the abstracts. The titles...
Response to comment on 'MeSH-up: effective MeSH text classification for improved document retrieval'
In response to the methodological considerations, we emphasize that in our paper we compare different MeSH classification systems on two tasks: (i) reproducing manual MeSH recommendations (referred to as indexing by Névéol et al.) and (ii) translating a textual query to an additional MeSH representation (referred to as query expansion). We show that the approach we propose works well on both ta...
Journal
Journal title: IEICE Transactions on Information and Systems
Year: 2023
ISSN: ['0916-8532', '1745-1361']
DOI: https://doi.org/10.1587/transinf.2022dap0010